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Abstract 

Preliminary  study  on  Ipomoea  phylogeny  shows  that  species  selection  may  influence 
flower-color  evolution  in  this  genus.  Evolutionary  transitions  between  pigmented 
flower  and  white  flower  appear  to  be  highly  asymmetrical.  White-flowered  lineages 
are  less  speciose  than  their  pigmented  sister  clade  and  evolutionary  transitions  to 
white  flowers  occurred  almost  exclusively  at  the  tip  of  phylogeny.  A  hypothesis  to 
explain  this  pattern  is  that  species  selection  acts  against  white-flowered  lineages  and 
prevents  them  from  increasing  in  relative  abundance  whiten  the  genus  Ipomoea.  A 
continuous-time  Markov  model  is  developed  to  test  differential  speciation  rates  within 
genus  Ipomoea.  Simulation  result  shows  that  the  observed  distribution  of  white 
lineages  is  not  compatible  with  equal  transition  rates  between  pigmented  flower  and 
white  flower.  Only  highly  asymmetrical  transition  rates  that  favor  transitions  to  white 
flower  are  compatible  with  the  observed  distribution.  This  result  implies  that  there  is  a 
tendency  to  increase  the  proportion  of  white-flowered  species  in  the  genus.  The  only 
process  that  could  prevent  this  outcome  is  species  selection  or  some  similar  higher- 
level  evolutionary  process  that  acts  against  white-flowered  lineages.  The  absence  of 
any  deep  white-flowered  clades  suggests  that  it  might  be  white-flowered  species  have 
greatly  elevated  extinction  rates  compared  to  pigmented  species. 
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Introduction 


It  has  long  been  recognized  that  natural  selection  may  operate  at  different 
levels  of  organization  (Lewontin  1970,  Wimsatt  1980,  Brandon  1982,  Sober 
and  Wilson  1994).  Evidence  has  accumulated  indicating  that  besides  an 
individual,  the  unit  of  selection  may  be  a  gene  (Werren  et  al.  1988),  a  family 
group  (McCauley  1994),  and  possibly  a  population  (Wade  1977).  More 
controversial  is  the  notion  that  selection  may  operate  at  higher  levels  of 
organization,  particularly  among  lineages  or  species.  Recently,  phylogenetic 
approaches  have  been  used  to  detect  species  selection.  Statistical  analyses  of 
phylogenetic  reconstructions  can  reveal  whether  clades  differ  in 
extinction/speciation  rates  and  whether  particular  traits  are  associated  with 
these  differences  (Mitter  et  al.  1988,  Farrell  et  al.  1991,  Nee  et  al.  1992, 
Slowinski  and  Guyer  1993,  Sanderson  and  Donoghue  1994,  1996).  The  goal 
of  this  project  is  to  examine  the  role  of  species  selection  in  influencing  the 
evolution  of  flower  color  in  morning  glories  (Ipomoea). 

The  pantropical  genus  Ipomoea  L.  consists  of  more  than  600  species  (Austin 
and  Huaman  1996)  of  vines,  shrubs,  and  trees  that  exhibit  a  tremendous 
diversity  of  flower  colors.  Miller  et  al.  (1999)  published  a  phylogeny  of  40 
Ipomoea  species  using  sequences  of  the  internal  transcribed  spacer  (ITS) 
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region  of  nuclear  ribosomal  DNA  and  sequences  from  three  exons  and  two 
introns  of  the  3'  end  of  nuclear  gene  waxy.  For  these  analyses,  flower  colors 
were  grouped  into  two  categories:  pigmented  and  white.  Three  interesting 
patterns  emerge: 

Evolutionary  transitions  appear  to  be  highly  asymmetrical.  In  particular, 
parsimony  character-state  reconstruction  of  Fig.  1  reveals  at  least  seven, 
possibly  9,  independent  transitions  from  pigmented  to  white  flowers.  By 
contrast,  there  are  at  most  two,  and  possibly  no,  transitions  in  the  reverse 
direction.  (I.  Wrightii,  I.  Graminea,  and  I.  imp e rati  clade  may  represent 
reversion.)  This  pattern  suggests  that  loss  of  floral  pigmentation  may  be  a 
manifestation  of  Dollo's  law  that  the  loss  of  complex  characters  is  irreversible 
(Dollo  1893,  Bull  and  Charnov  1985,  Sanderson  1993). 

There  is  a  tendency  for  white-flowered  lineages  to  be  less  speciose  than  their 
pigmented  sister  clade.  In  particular,  three  white-flowered  lineages  (/. 
Saintronanensis  and  the  mostly  white-flowered  clade  consisting  of  /. 
graminea,  I.  wrightii,  and  /.  imperati)  are  substantially  less  speciose  than  their 
pigmented  sister  clades,  while  no  white-flowered  lineages  are  more  speciose 
than  their  sister  clade.  The  sister  taxon  of  each  remaining  white-flowered 
species  is  a  single  pigmented  species. 
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Evolutionary  transitions  to  white  flowers  occurred  almost  exclusively  at  the  tip 
of  the  phylogeny.  All  white  clades  are  two  nodes  deep  or  less,  with  most 
represented  by  a  single  species. 

The  strong  asymmetry  in  transitions  between  pigmented  and  white  flowers  can 
be  explained  by  our  current  knowledge  of  the  genetic  and  molecular  control  of 
flower  color.  In  Ipomoea,  as  in  many  plants,  floral  pigments  are  anthocyanins, 
which  are  produced  from  the  precursors  malonyl-CoA  and  p-coumarl-CoA  by 
a  biochemical  pathway  consisting  of  six  core  enzymes  (Dooner  et  al.  1991, 
van  der  Meer  et  al.  1993,  Holton  and  Cornish  1995).  Analysis  of  spontaneous 
and  induced  white-flowered  mutants  in  plants,  such  as  Petunia  and 
snapdragon  {Antirrhinum),  has  revealed  that  in  all  cases,  white  flowers  arise 
whenever  one  or  more  structural  or  regulatory  genes  of  this  pathway  are 
deactivated  (Forkmann  and  Dangelmayr  1980,  Harker  et  al.  1990,  Inagaki  et 
al.  1994,  1996,  Hisatomi  et  al.  1997,  Quattrocchio  et  al.  1998,  Mol  et  al. 
1998).  Similarly,  naturally  occurring  white-flowered  species  and  variants, 
including  two  in  Ipomoea,  (Ennos  and  Clegg  1983,  Epperson  and  Clegg  1988) 
also  appear  to  be  due  to  knockout  mutations  (Tiffin  et  al.  1998,  Habu  et  al. 
1998,  Quattrocchio  1994). 
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This  evidence  indicates  that  white  flowers  are  most  likely  to  arise  genetically 
by  loss-of-function  mutations  affecting  specific  anthocyanin  pathway  genes.  If 
this  pattern  holds  upon  further  investigation,  it  provides  a  ready  explanation 
for  the  asymmetry  in  flower-color  transitions:  deactivating  mutations  are 
relatively  common,  whereas  restoration  of  function  is  rare  because,  once 
deactivated,  a  gene  will  acquire  additional  deactivating  mutations  through 
genetic  drift.  These  additional  mutations  will  reduce  the  probability  that  a 
single  mutation  could  restore  function  (but  see  Hanson  et  al.  1996). 

The  apparent  strong  asymmetry  in  the  direction  of  flower-color  transition,  if 
real,  constitutes  a  process  that  would  tend  to  increase  the  proportion  of  white- 
flowered  species  in  the  genus.  If  unopposed  by  other  processes,  it  would 
eventually  produce  a  genus  in  which  most  species  have  white  flowers. 
Because  the  asymmetry  results  from  the  integration  of  all  evolutionary 
processes  acting  below  the  species  level,  the  only  process  that  could  prevent 
this  outcome  is  species  selection  or  some  similar  higher-level  evolutionary 
process.  Moreover,  the  observation  that  transitions  to  white  flowers  occur 
almost  exclusively  at  the  tips  of  our  phylogeny  suggests  that  white-flowered 
species,  for  unknown  ecological  reasons,  have  greatly  elevated  extinction 
rates,  compared  to  pigmented  species,  though  the  concomitant  existence  of 
decreased  speciation  rates  in  white-flowered  lineages  can  not  be  ruled  out.  The 
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overall  goal  of  the  project  is  to  test  the  hypothesis  that  species  selection  acts 
against  white-flowered  lineages  and  prevents  them  from  increasing  in  relative 
abundance. 
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Systems 
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Fig.  1  Strict  consensus  tree  for  Ipomoea  based  on  simultaneous  analysis  of  ITS 
and  waxy  sequences  (Miller,  1999).  *  indicate  white  flower  species. 
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Methods 


In  order  to  test  the  asymmetry  of  Ipomoea  phylogeny,  we  developed  the 
following  method.  A  set  of  equally  most  parsimonious  phylogenies  is 
generated.  One  phylogeny  is  randomly  selected  for  analysis  and  linearized. 
Branch  lengths  are  then  estimated  by  maximum  likelihood  under  the 
assumption  for  a  molecular  clock.  Based  on  this  linearized  phylogeny,  the 
distribution  of  white-flowered  species  is  reconstructed  using  a  continuous-time 
Markov  model.  Using  this  model,  1000  trees  with  random  flower-color 
transitions  will  be  generated  stochastically  for  each  phylogeny.  We  choose  R 
(number  of  white  species/total  length  of  white  lineages)  and  S  (number  of 
white  species)  to  be  the  two  statistics  to  test  asymmetrical  distribution  of 
white-flowered  and  pigmented  flowered  species  in  phylogeny.  This  procedure 
is  repeated  for  a  number  of  randomly-chosen  equally  parsimonious  topologies 
with  a  set  of  transition  rates  between  white  flowered  species  and  pigmented 
flowered  species.  It  yields  a  distribution  of  joint  probabilities  of  R  and  S.  The 
statistics  are  also  calculated  for  the  actual  tree  and  compared  to  the  frequency 
distribution  of  the  statistics  among  the  simulations.  If  fewer  than  five  percent 
of  the  statistics  from  the  stochastic  simulations  are  as  extreme  as  that  for  the 
actual  tree,  the  stochastic  model  may  be  rejected  as  incompatible  with  the 
observed  pattern. 
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(1)  Phylogeny  reconstruction. 


The  phylogeny  of  40  species  was  analyzed  using  the  sequence  of  the  internal 
transcribed  spacer  (ITS)  region  of  nuclear  ribosomal  DNA  and  sequences  for 
three  exons  and  two  introns  of  the  3'  end  of  the  nuclear  gene  waxy.  Merremia 
tuberosa  and  Operculina  brownii  are  used  as  outgroups.  Combined  ITS  and 
waxy  sequence  has  1285  nucleotides.  The  phylogeny  was  reconstructed  by 
PAUP  using  parsimony.  Among  the  1285  sites,  249  sites  are  parsimony- 
informative.  Finally,  we  obtained  56  equally  most-parsimonious  phylogenies. 

(2)  Generate  linearized  phylogeny  and  estimate  branch  length  by  maximum 
likelihood  method. 

Randomly  select  one  maximum  parsimonious  phylogeny  for  analysis. 
Takezaki  et  al.  (1995)  proposed  a  method  to  sequentially  prune  species 
exhibiting  significant  deviation  from  the  average  nucleotide  substitution  rate 
until  rate  variation  among  the  remaining  species  is  consistent  with  a  molecular 
clock.  Their  method  is  called  two-cluster  test.  The  principle  of  the  test  is  to 
examine  the  equality  of  the  average  substitution  rate  for  two  clusters  that  are 
created  by  a  node  in  a  given  tree.  The  basic  quantities  they  care  about  are  the 
averages  of  observed  numbers  of  substitutions  per  site  from  node  to  the  tips  of 
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the  two  clusters.  Under  the  assumption  of  rate  constancy,  the  expected 
difference  of  the  two  clusters'  substitution  rates  is  zero.  If  the  test  shows 
significant  rate  difference  between  the  two  clusters,  they  eliminate  the  cluster 
whose  average  branch  length  from  the  root  is  more  different  from  the  average 
of  all  sequences  than  the  other  cluster.  This  test  and  sequence  elimination 
therefore  proceeds  from  terminal  nodes  to  the  root  of  the  tree.  A  linearized  tree 
will  then  be  constructed  for  a  given  topology  under  the  assumption  of  rate 
constancy.  Takezaki  has  written  a  program  package  to  generate  linearized  tree. 
I  used  this  package  to  linearize  phylogenies  in  this  project. 

Branch  lengths  are  then  estimated  by  maximum  likelihood  method  under  the 
assumption  of  a  molecular  clock.  Yang  (1997)  wrote  a  package  of  programs, 
called  PAML,  for  phylogenetic  analyses  of  DN A  sequences  using  the 
maximum  likelihood  method.  General  assumptions  of  the  programs  are  that 
substitutions  occur  independently  in  different  lineages  and  among  sites,  the 
process  of  substitution  is  a  time-homogeneous  Markov  process,  and  the 
frequencies  of  nucleotides  have  remained  constant  over  the  time  period 
covered  by  data.  I  used  this  program  package  to  obtain  branch  lengths  of  trees 
in  this  project. 

(3)  Simulate  transitions  between  pigmented  flower  and  white  flower. 
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Pagel  (1994)  developed  a  continuous-time  Markov  model  for  character 
evolution.  The  model  is  designed  to  estimate  simultaneous  transition  rates  in 
pairs  of  binary  characters  arrayed  on  a  phylogney.  All  species  in  the  study 
could  have  only  one  of  two  characters,  in  our  case,  a  pigmented  flower  or  a 
white  flower.  I  assign  it  to  be  either  P  state  (pigmented  flower)  or  W  state 
(white  flower).  Along  a  branch  beginning  with  state  P/W,  the  flower  color  will 
either  remain  the  same  or  change  to  the  other  state.  The  probability  for  this 
change  or  unchange  over  time  t  (t  is  the  branch  length)  is  Ppp(t),  Pww(t),  Pwp(0, 
or  PpW(t).  Assume  that  the  flower  color  evolves  according  to  a  Markov  process 
such  that  the  probability  of  change  is  independent  in  each  branch  of  the 
phylogenetic  tree,  and  that  the  probability  of  changing  from  one  state  to  the 
next  depends  only  upon  the  state  at  the  beginning  of  the  branch,  not  on  events 
previous  to  that.  The  probability  that  flower  color  changes  from  state  i  to  state 
j  over  time  interval  t+dt  is  given  by 

Pij(t+JO=Pii(t)*qij*^+Pij(t)*(l-qij)*^ 
where  t  is  an  arbitrary  time  period,  dt  is  a  short  interval  of  time,  and  q,j  is 
transition  rate  parameter  denoting  the  rate  of  change  from  state  i  to  state  j  over 
an  infinitesimally  short  time  interval.  The  1-qy  term  represents  the  probability 
of  staying  in  state  j.  This  equation  states  that  flower  color  can  change  from 
state  i  to  state  j  either  by  remaining  unchanged  for  a  period  t  followed  by  a 
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transition  to  state  j,  or  by  changing  to  state  j  during  time  period  t,  then 
remaining  in  state  j  over  period  dt.  The  set  of  equations  describing  the  four 
possible  probabilities  of  interest  can  be  presented  in  matrix  form 

?(t+dt)=?{t)*(I+Qdf), 
|  l-Ppw(t+dr)  Ppw(t+A)|  =  I  Ppp(t)  PPw(t)|  *  |(l-qpw)*A  q^*dt\ 
|Pwp(t+Jr)  1-Pwp(t+Jr)  |      |  Pwp(t)  Pww(t)  |    |qwp(t)*rff  (l-qwp)*^| 

To  solve  for  the  P(t)  in  terms  of  the  rate  parameters  in  Q, 
dP(t)/<fr=P(t)*Q, 
P(t)=exp(Qt), 

P(t)  is  only  a  function  of  the  transition  rates  between  pigmented  flower  and 
white  flower. 

Each  simulation  begins  with  a  pigmented  flowered  species.  Two  transition 
rates  are  assigned  at  the  beginning  of  the  simulation.  Along  each  branch,  there 
is  an  expected  transition  probability,  P(t),  calculated  based  on  the  continuous- 
time  Markov  model.  The  program  could  generate  random  numbers  uniformly 
distributed  in  (0,1).  By  comparing  the  random  number  to  the  expected  P(t),  the 
stochastic  model  decides  whether  flower  color  would  change  along  the  branch. 
Beginning  with  a  pigmented  flower  species  on  the  root  of  a  tree,  it  would 
finally  generate  a  white  flower  species  distribution  on  the  tips  of  the  tree  based 
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on  the  given  phylogeny.  By  assigning  different  transition  rates,  I  could  obtain 
distributions  of  white-flowered  species  corresponding  to  different  transition 
rates. 

(4)  Choice  of  test  statistics 

The  primary  goal  of  this  project  is  to  develop  a  statistical  test  for  whether 
white  lineages  are  confined  to  the  tips  of  robustly  supported  clades,  which 
would  imply  higher  rates  of  extinction  (or  lower  rates  of  speciation)  in  white- 
flowered  lineages.  We  want  to  find  a  statistic  which  could  capture  the  degree 
to  which  white-flowered  lineages  are  confined  to  the  tips  of  the  phylogeny. 
This  test  statistic  should  reflect  two  properties  that  are  negatively  correlated:  if 
the  probability  of  transition  from  pigmented  to  white  is  low,  S,  the  number  of 
white  species  will  be  low,  but  the  ratio  R=(number  of  white  species)/(total 
length  of  white  lineages)  will  be  high;  by  contrast,  if  that  transition  probability 
is  high,  the  number  of  white  species  will  be  high,  but  R  will  be  low.  Because  a 
single  test  statistic  is  not  likely  to  capture  this  relationship,  we  believe  that  a 
bivariate  plot  of  R  vs.  S  will  provide  a  useful  test  of  whether  the  observed 
distribution  of  the  white  lineages  differs  from  what  expected  under  the 
stochastic  model. 
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In  the  simulation,  the  program  could  remember  on  which  branch  the  transtion 
between  white  flower  and  pigmented  flower  happened  and  the  character  state 
(white  or  pigmented  flower)  of  each  node.  If  a  transition  from  pigmented 
flower  to  white  flower  happens  on  a  branch,  I  assume  the  transition  happened 
on  the  base  of  the  branch.  When  calculating  the  total  length  of  white  lineages, 
I  trace  back  each  white  flowered  species  to  its  oldest  ancestor  whose  character 
state  is  white  flowered.  The  total  branch  length  between  that  ancestor  and  the 
white  flowered  species  is  the  length  of  this  white  lineage.  The  total  length  of 
white  lineages  is  the  summation  of  length  of  all  white  lineages  subtracting  the 
branches  which  are  counted  repeatedly. 

(5)  Find  the  probability  contour  which  covers  95%  of  the  simulated  values. 

For  each  phylogeny,  repeat  the  stochastic  simulations  described  above  for 
1000  times.  It  will  yield  a  joint  probability  distribution  of  R  and  S.  On  the 
boarder  of  the  region  that  covers  all  simulated  values,  the  point  which  has  the 
minimum  joint  probability  will  be  cut  off.  This  process  is  continued  until  the 
remaining  region  encloses  95%  of  the  joint  probabilities.  It  will  finally 
generate  a  set  of  probability  contours  corresponding  to  different  levels  of 
transition  rate. 
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Steps  (2)-(5)  are  then  repeated  for  56  equally  most  parsimonious  trees 
generated  in  step  A.  This  will  yield  an  average  distribution  of  joint  probability 
of  R  and  S.  We  believe  this  mean  can  be  taken  as  a  reasonable  estimation  of 
overall  probability.  The  statistic,  R  and  S,  is  also  calculated  for  the  reference 
tree  (Fig.  1).  Since  the  reference  tree  is  probably  the  one  most  close  to  the 
actual  tree,  the  reference  tree's  R  and  S  are  taken  to  be  observed  R  and  S.  If 
the  observed  values  of  these  two  variables  fall  outside  the  probability  contour 
that  encloses  95%  of  the  simulated  values,  the  observed  distribution  of  white 
lineages  can  be  taken  as  incompatible  with  the  corresponding  transition  rates. 
There  are  two  cases  we  want  to  test:  equal  transition  rate  between  pigmented 
and  white  flowers  or  a  rate  of  zero  for  gain  of  pigmentation. 
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Result 


Among  the  42  species  we  studied,  9  species  are  white-flowered.  They  are  /. 
obscura,  I.  graminea,  I.  imperati,  I.  diamantinensis,  I.  alba,  I.  arborescens,  I. 
saintronaensis,  I.  lacunosa,  and  /.  pandurata.  The  branch  length  of  Fig.  1  is 
also  estimated  by  PAML.  The  observed  R  ratio  (number  of  white 
species)/(totally  length  of  white  lineages)  is  63.87.  The  actual  S  (number  of 
white  species)  is  9. 

After  linearization,  some  of  the  most  parsimonious  trees  became  the  same.  I 
finally  obtained  a  total  of  18  linearized  trees.  The  largest  tree  has  28  species, 
while  the  smallest  one  has  17  species.  The  18  linearized  trees'  branch  lengths 
were  then  estimated  by  PAML.  Simulations  were  done  along  all  linearized 
trees  and  the  original  most  parsimonious  trees  as  well.  The  transition  rate  from 
pigmented  flower  to  white  flower  is  set  to  be  from  1  to  45  (=100*qpw,  refer  to 
Methods  section),  while  the  transition  rate  from  white  flower  to  pigmented 
flower  is  set  to  be  from  0  to  45  (=100*qwp).  Because  all  simulations  begin  with 
a  pigmented  flower  species  on  the  root  of  a  tree,  the  transition  rate  from 
pigmented  flower  to  white  flower  can  not  be  zero. 
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The  result  is  presented  in  3  sections.  Fig.  2  and  Fig.  3  are  three  dimensional 
graphs  showing  the  relationships  between  R  and  the  two  transition  rates.  X 
axis  is  the  transition  rate  from  white  flower  to  pigmented  flower,  Y  axis  is  the 
transition  rate  from  pigmented  flower  to  white  flower,  and  Z  axis  is  average  R 
value  of  1000  simulations.  Fig.  2  is  the  average  simulation  result  of  all 
linearized  trees.  Fig.  3  is  the  average  simulation  result  of  all  most 
parsimonious  trees.  Both  Fig.  2  and  Fig.  3  are  separated  by  the  number  of 
white  species.  From  Fig.  2,  we  see  that  in  all  4  cases,  simulation  tends  to  get 
big  R  when  transition  rate  from  white  flower  to  pigmented  flower  is  low  and 
transition  rate  from  pigmented  flower  to  white  flower  is  high.  R  value 
decreases  along  with  increasing  transition  rate  from  white  flower  to  pigmented 
flower  and  decreasing  transition  rate  in  reverse  direction.  Among  the  4  cases, 
when  the  number  of  white  species  is  small,  R  tends  to  distribute  in  a  large 
range,  from  about  75  to  20  in  Fig.  2.1,  and  when  the  number  of  white  species 
is  big,  R  distributes  in  a  small  range,  from  60  to  40.  Both  Fig.  2  and  Fig.  3 
show  the  same  pattern.  In  both  Fig.  2.1  and  Fig.  3.1,  R  approaches  the 
observed  value  of  63.87  only  when  rate  of  transition  from  pigmented  flower  to 
white  flower  is  high  and  reverse  transition  rate  is  low. 

Fig.  4  and  Fig.  5  are  also  three  dimensional  graphs  showing  the  relationships 
between  R  and  the  two  transition  rates.  X  and  Y  axies  are  still  the  same  with 
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Fig.  2  and  Fig.  3,  while  Z  axis  is  average  percentage  of  R  value  which  is 
bigger  than  the  observed  R  in  1000  simulations.  Fig.  4  is  the  average 
simulation  result  of  all  linearized  trees.  Fig.  5  is  the  average  simulation  result 
of  all  most  parsimonious  trees.  Again,  both  Fig.  4  and  Fig.  5  are  separated  by 
number  of  white  species.  From  Fig.  4,  we  see  that  in  all  4  cases,  when 
transition  rate  from  white-flowered  species  to  pigmented-flowered  species  is 
low  and  transition  rate  in  reverse  direction  is  high,  we  could  expect  to  see 
more  than  5  percent  of  simulations  get  R  bigger  than  observed  R.  When  the 
two  transition  rates  are  equal  or  transition  rate  from  white-flowered  species  to 
pigmented-flowered  species  is  bigger  than  the  transition  rate  in  reverse 
direction,  almost  no  simulations  get  an  R  value  bigger  than  the  observed  R. 
Fig.  5  shows  the  same  pattern.  Again,  Fig.  4  and  Fig.  5  show  range  of  R  tends 
to  shrink  along  with  the  increasing  of  number  of  white  species. 

Fig.  6  shows  the  contours  that  cover  95%  of  joint  probabilities  of  R  and  S.  X 
axis  is  S,  number  of  white  species.  Y  axis  is  R,  (number  of  white 
species)/(total  length  of  white  lineages).  The  little  triangle  indicates  the 
observed  R  and  S.  Only  in  Fig.  6.2  and  Fig.  6.3,  the  contour  covers  the 
observed  values.  In  these  two  cases,  the  transition  rate  from  white  flower  to 
pigmented  flower  is  much  less  than  the  transition  rate  in  reverse  direction. 
Another  pattern  we  could  see  from  Fig.  6  is  that  along  the  diagonal  from  upper 
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right  corner  to  bottom  left  corner,  transition  probability  from  pigmented 
flower  to  white  flower  decreases  from  very  big  to  very  small,  along  this 
change,  R  tends  to  decrease  and  S  tends  to  increase. 


1  8 


Discussion 


From  the  three  figures,  Fig.  4,  Fig.  5,  and  Fig.  6,  we  can  see  the  null 
hypothesis  of  equal  transition  rate  is  rejected.  The  observed  pattern  is  only 
compatible  with  a  high  transition  rate  from  pigmented  flowered  to  white 
flower  and  low  transition  rate  from  white  flower  to  pigmented  flower.  This 
confirms  our  original  hypothesis  that  deactivating  mutations  are  relatively 
common,  whereas  restoration  of  function  is  rare.  Since  this  strong  asymmetry 
in  the  direction  of  flower-color  transition  constitutes  a  process  that  would  tend 
to  increase  the  proportion  of  white  flower  species  in  the  genus,  there  must  be  a 
process,  such  as  species  selection  or  some  similar  higher-level  evolutionary 
process  that  could  prevent  this  outcome.  This  indicates  a  higher  rate  of 
extinction  or  lower  rate  of  speciation  in  white-flowered  lineages.  The  absence 
of  deep  white-flower  clades  suggests  that  white-flowered  species  might  have 
greatly  elevated  extinction  rates  compared  to  pigmented  species. 

From  Fig.  2  to  Fig.  5,  we  see  that  the  range  of  R  tends  to  decrease  when  S 
increases.  This  is  because  when  the  number  of  white  species  is  small,  white 
species  could  be  on  either  short  terminal  branches  or  long  terminal  branches, 
so  the  variance  of  total  length  of  white  lineages  tends  to  be  large;  when  the 
number  of  white  species  increases,  for  example,  25  out  of  42  species  are 
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white-flowered  species,  the  white  lineages  will  trace  back  to  the  root  of  the 
tree  and  total  length  of  white  lineages  does  not  change  very  much.  So  R  is 
constricted  in  a  small  range. 

Fig.  6  shows  an  interesting  result.  Fig.  6.3  indicates  that  when  transition  rate 
from  pigmented  flower  to  white  flower  is  much  higher  than  transition  rate 
from  white  flower  to  pigmented  flower,  the  average  number  of  white  species 
is  small,  around  13.  While  Fig.  6.7  shows  that  when  transition  rate  from 
pigmented  flower  to  white  flower  is  much  less  than  transition  rate  in  reverse 
direction,  the  average  number  of  white  species  is  big,  around  25.  This 
contradicts  the  expectation  that  higher  transition  probability  from  pigmented 
flower  to  white  flower  will  generate  a  bigger  number  of  white-flowered 
species. 

Intuitively,  we  expect  to  see  higher  transition  rate  from  pigmented  flower  to 
white  flower  will  result  in  increased  mean  number  of  white  flowered  species. 
But  in  Fig.  6,  as  the  rate  of  transition  from  white  flower  to  pigmented  flower 
increases  (e.g.  going  down  a  column),  the  mean  number  of  white  species 
increases  and  as  the  rate  of  transition  from  pigmented  to  white  increases  (e.g. 
going  across  a  row),  the  mean  number  of  white-flowered  species  decreases. 
When  the  transition  rate  from  white  flower  to  pigmented  flower  is  fixed, 
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increasing  the  transition  rate  from  pigmented  flower  to  white  flower  (going 
across  a  row)  will  increase  the  probability  of  transition  from  pigmented  flower 
to  white  flower  on  each  single  branch.  However,  the  small  number  of  white 
flowered  species  is  more  likely  to  be  due  to  multiple  transitions  on  the  external 
branch,  while  the  big  number  of  white  flowered  species  is  more  likely  to  be 
due  to  a  single  transition  on  the  internal  branch  which  is  close  to  the  root  of 
the  phylogeny.  When  the  probability  of  transition  on  each  single  branch  is 
increased,  it  will  benefit  the  first  case  more  than  the  second  one,  since  more 
branches  are  involved  in  the  first  case.  Multiple  origin  of  white  flowered 
species  on  external  branches  will  gain  more  weight.  This  will  drag  down  the 
mean  number  of  white  flowered  species. 

When  the  transition  rate  from  pigmented  flower  to  white  flower  is  fixed, 
increasing  the  transition  rate  from  white  flower  to  pigmented  flower  (going 
down  a  column)  will  increase  the  probability  of  losing  a  white  flowered 
species  on  each  single  branch.  In  order  to  clean  up  a  pigmented  flower  to 
white  flower  transition  which  happened  on  an  internal  branch  close  to  the  root 
of  the  phylogeny,  back  transition  on  the  branch  right  next  to  the  origin  of 
white  flowered  species  is  important.  If  this  step  is  missed,  it  will  become 
harder  and  harder  to  clean  up  white  flowered  species,  since  they  will  spread 
out  in  the  following  branching  process.  In  this  case,  cleaning  up  white 
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flowered  species  is  more  likely  to  be  a  single  hit.  For  multiple  origin  of  white 
flowered  species  on  external  branches,  the  probability  to  clean  up  white 
flowered  species  will  increase  in  fold  of  number  of  transitions  to  white.  It  will 
be  affected  more  than  the  former  case,  since  it  is  more  likely  to  be  multiple 
hits.  When  transition  rate  from  white  flower  to  pigmented  flower  increases, 
big  number  of  white  flowered  species  will  gain  more  weight.  This  will 
increase  mean  of  number  of  white  flowered  species. 

After  Miller  et  al.  (1999)  published  the  phylogeny  of  40  Ipomoea  species,  this 
analysis  has  been  expanded  by  adding  30  additional  taxa.  Repeating  the  same 
statistical  test  to  the  new  phylogeny  would  give  a  more  robust  conclusion. 
Also,  ancestral  state  reconstruction  based  on  preliminary  phylogeny  suggests 
that  there  is  a  trend  toward  white-flowered  clades  being  less  speciose  than 
pigmented  sister  clades.  However,  because  most  white  lineages  are  confined  to 
tip  branches,  the  sister  clade  of  most  species  contain  only  one  pigmented 
species  each.  A  denser  sampling  of  species  closely  related  to  these  white 
species  may  reveal  sister  clades  containing  more  than  one  species.  The 
inclusion  of  the  additional  taxa  should  allow  us  to  ascertain  whether  the 
hypothesis  of  white-flowered  clades  being  less  speciose  than  pigmented  sister 
clades  and  should  allow  a  more  reliable  comparison  of  sister-clade  diversity. 
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Fig.  2  R  vs.  transition  rate 
(Linearized  trees) 


Fig.  2.1  number  of  white  species  =5  Fig.  2.2  number  of  white  species  =10 


Fig.  2  X  axis  is  the  transition  rate  from  white  flower  to  pigmented  flower,  Y 
axis  is  the  transition  rate  from  pigmented  flower  to  white  flower,  Z  axis  is  the 
simulated  R. 
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Fig. 3  R  vs.  transition  rate 
(Most  parsimonious  trees) 


Fig.  3.1  number  of  white  species  =10  Fig.  3.2  number  of  white  species  =15 


Fig.  3.3  number  of  white  species  =20  Fig.  3.4  number  of  white  species  =25 


Fig.  3  X  axis  is  the  transition  rate  from  white  flower  to  pigmented  flower,  Y 
axis  is  the  transition  rate  from  pigmented  flower  to  white  flower,  Z  axis  is  the 
simulated  R. 
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Appendix 


1.  cuttree.c 

/*  Nei's  program  will  tell  which  species/nodes  to  cut  off. 
This  program  generates  the  new  tree  after  cutting  off.  */ 

#include  <stdlib.h> 
#include  <stdio.h> 
#include  <math.h> 

int  i, j , k, l,m, ncut , cut [10  0] ,N, tp [150] ,nodeA[100] , nodeB [100] 
int  ntp, mother , grandma; 

int  ncutspecies , cutspecies [100] , nsite , check, newN, newtp [150] 
int  ancestor [150] ; 
int  nlef t , nright ; 
char  left, right; 

char  seq [50]  [2  000]  , species [50]  [5  0]  ; 

FILE  *  inp , * input seq, *out , * out seq , *  input name ; 

main ( ) 
{ 

input ( ) ; 
calculation ( ) ; 
output  ( )  ; 

} 


input ( ) 
{ 

/*  input  parsimonious  tree  */ 

inp=f open ( "cuttree . dat " ,    "r") ; 
f  scanf  (inp,  "%d\n" ,  &N-)  ;  /*  N,   number  of  species 

k=0; 
1  =  0; 
ntp=0 ; 
do{ 

ntp++ ; 

fscanf (inp,  "%d\t" , &tp  [ntp]  )  ; 
if (tp [ntp] ==1000)  k=k+l; 
if (tp [ntp] ==2000)  1=1+1; 
}while (tp [ntp] !=5000) ; 
ntp- - ; 

if(k!=l)   printf ( "dataf ile  is  wrong. \n" ) ; 
fscanf ( inp , " \n" ) ; 
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/*  input  the  nodes/species  to  cut  off  */ 
f scanf (inp, "%d\n" , &ncut) ; 

/*  ncut ,   number  of  species/nodes  to  cut  off  */ 

for ( i=l ; i<=ncut ; i++) 
{ 

f scanf (inp, "%d\n" , &cut  [i] ) ; 

} 

f close (inp) ; 

/*   input  sequences  data  */ 

inputseq=f open (" sequence . dat " ,    "r") ; 
f scanf ( inputseq, " %d\n" , &N) ;  /*  N,   number  of 

species  */ 

f scanf ( inputseq, " %d\n" , &nsite) ;  /*  nsite,  total 

number  of  sites 

compared  among  sequences  */ 

f  or  (  i  =  1 ;  i  <  =N ;  i-M- ) 
{ 

f scanf (inputseq, "%d\n" , &m) ; 
for (j=l; j<=nsite; j++) 
{ 

seq[m]  [j  ]  =getc  (inputseq)  ; 

/*  seq [m] [j] ,    the  mth  sequence's  jth  site 

*/ 

} 

} 

f close ( inputseq) ; 

inputname=f open ( " species_name " , " r " ) ; 
f scanf (inputname, "%d\n" , &N) ; 

for (i=l;i<=N;i++) 
{ 

f scanf ( inputname ," %d  " , & j ) ; 

f scanf ( inputname , " %s\n" , species  [  j ]  )  ; 

} 

f close ( inputname) ; 

} 


calculation ( ) 
{ 

tree_structure ( )  ; 
cut_tree ( )  ; 
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} 


tree_structure ( ) 
{ 


i  =  0  ; 
k=N; 

do{ 

/*  find  the  right  side  of  the  node  */ 

do{ 

i=i+l ; 

}while (tp  [i]  !  =2000)  ;   /*  2000  indicate  right  edge  of  a 
cluster  */ 

k=k+l;  /*  k  is  the  node  #  */ 

tp[i]=k;  /*  reassign  right  edge  to  be  node# 

*/ 

nodeB [k] =tp [i-1] ; 

/*  find  the  left. side  of  the  node  */ 

j=i; 
do{ 

j=j-i; 

}while (tp [ j ] ! =1000) ;     /*   1000  indicates  left  edge  of  a 
cluster  */ 

tp[j]=k;  /*  reassign  left  edge  to  be  node# 

*/ 

nodeA [k] =tp [ j  +1]  ; 

} while ( i<ntp) ; 

for (i=l ; i<=3*N-2 ; i++) 
{ 

newtp [i] =tp  [i]  ; 

} 

for (i=N+l; i<=2*N-2 ; i++) 
{ 

k=nodeA  [i]  ; 
ancestor [k] =i ; 
k=nodeB [i] ; 
ancestor [k] =i ; 

} 

} 

cut_tree ( ) 
{ 
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} 

else 


nodeB [grandma] =nodeB [mother] 


} 

else 


nodeB [mother] =0 ; 

ancestor [nodeA [mother] ] =grandma; 
if (mother==nodeA [grandma] ) 
{ 

nodeA [grandma] =nodeA [mother] 

} 

else 

{ 

nodeB [grandma] =nodeA [mother] 

} 

} 


newN=N-ncutspecies ,- 

} 

output  (  ) 


out=f open ( "baseml_tree . dat " , " w" ) ; 
fprintf (out, "%d\t%d\n\n" ,newN, 1) ; 

left=' (' ; 
right= ' ) ' ; 
nlef t=0 ; 
nright=0 ; 

for (i=l; i<=ntp; i++) 
{ 

if (  (tp [i] >N) && (newtp  [i]  ! =0)  ) 
{ 

check=0 ; 

for  (j=l;j<i;j+  +  ) 
{ 

if (to  [  j ] =  =  tp [i] )  check=l 

} 

if  (check==0) 
{ 

fprintf (out, "%c" , left)  ; 
nlef t++; 
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} 

else 
{ 

fprintf (out, "%c" , right) ; 
nright++ ; 

} 

} 

if ( (tp [i] <=N) && (newtp [i] ! =0) ) 
{ 

if ( (tp [i-1] <=N) && (newtp [i-1]  ! =0) ) 
fprintf (out , " %c" ,  ' ,  ' )  ; 

fprintf (out , " %s" , species [tp [i]  ]  )  ; 
} 

} 

if (nlef t ! =nright )  printf ( "There  is  sth  wrong. \n"); 
f close (out ) ; 

outseq=f open ( "baseml_seq . dat " , " w" ) ; 
fprintf (outseq, " %d\t " , newN) ; 
fprintf (outseq , " %d\n" , nsite) ; 

for (i=l;i<=N;i++) 
{ 

check=0 ; 

for ( j=l; j <=ncut species ; j  +  +  ) 
{ 

if  (i  =  =  cutspecies [ j ]  )  check=l; 
} 

if (check==0) 
{ 

fprintf (outseq , " %s\n " , species [ i] ) ; 
for ( j=l; j <=nsite ; j  +  + ) 
{ 

fprintf (outseq, " %c" , seq [i] [ j ] ) ; 

} 

fprintf (outseq, " \n" ) ; 
} 

} 

f close (outseq) ; 

} 
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2 .   3 -branch . c 

/*  Change  trees  with  3  branches  at  a  single  node  into 
bifurcating  trees  */ 

#include  <stdlib.h> 
#include  <stdio.h> 
#include  <math.h> 

int  N, tp [150] ,  i,  j , k, l,nodeA[10  0] , nodeB [100] , nodeC [100] 
int  ntp , check, bad_node ; 

int  left ,  right ,  nleft  ,.nright ,  newtp  [150]  , newntp; 
char  species  [50]  [50]  ; 
FILE  *inputtp, *outtp; 

main  ( ) 
{ 

input ( ) ; 
structure ( ) ; 
output  ( )  ; 

} 


input  ( ) 
{ 

/*   input  parsimonious  tree  */ 

input tp=f open (" topology . dat " ,    "r") ; 
f  scanf  (  inputtp  ."  %d\ n  ".  &N)  ..•  /*  N.  number 

species  */ 

k=0  ; 
1  =  0; 
i  =  0  ; 
do{ 

i=i+l ; 

f scanf ( inputtp, " %d\t " , &tp [i] ) ; 

if (tp  [i] =  =  1000)  k=k+l; 

if (tp  [i] =  =  2000)    1  =  1  +  1; 
}while (tp [i] !=5000) ; 
ntp=i-l ; 

if(k!=l)    printf ( "dataf ile  is  wrong. \n"); 
fclose (inputtp) ; 

} 

structure ( ) 
{ 
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k=N; 
do{ 

/*  find  the  right  side  of  the  node  */ 

do{ 

i=i+l; 

}while (tp [i] ! =2000) ;   /*  2000  indicate  right  edge  of  a 
cluster  */ 

k=k+l;  /*  k  is  the  node  #  */ 

tp[i]=k;  /*  reassign  right  edge  to  be  node# 

*/ 

nodeB [k] =tp [i-1] ; 

/*  find  the  left  side  of  the  node  */ 

j=i; 
do{ 

j=j-i; 

}while (tp [ j ] ! =1000) ;      /*   1000   indicates  left  edge  of  a 
cluster  */ 

tp[j]=k;  /*  reassign  left  edge  to  be  node# 


*/ 


nodeA [k] =tp [ j  +1]  ; 

/*check  if  this  tree  have  3  node  case  */ 
l=j+l; 

if ( (tp [1] <=N) && (tp [1+1] ! =nodeB [k] ) ) 
{ 

nodeC [k] =tp [1+1] ; 

} 

if (tp  [1]  >N) 

{ 

do{ 

1=1+1; 

}while(tp[l] !=nodeA[k]);   /*  find  the  right  edge  of 

nodeA  * / 

if (tp  [1  +  1]  ! =nodeB [k]  )         /*  3  node  case  */ 
{ 

nodeC [k] =tp [1+1] ; 


if (nodeC [k] ! =0) 
{ 

if (nodeC [k] <=N)  1=1+1 
else 


4() 


{ 

1=1+1; 
do{ 

1=1+1; 

}while (tp [1] !=nodeC [k] ) ; 

} 

if (tp  [1  +  1]  ! =nodeB [k] ) 

printf ( "There  is  sth  wrong  for  node  %d\n",k); 

} 

}while ( i<ntp) ; 
i=N; 

check=0 ; 
do{ 

i=i+l  ; 

if (nodeC  [i]  ! =0) 
{ 

check=l ; 
bad_node=i ; 

} 

}while (i<=2*N-l) ; 

f or ( i=l ; i<=ntp ; i++) 
{ 

newtp  [ i ] =tp [i]  ; 

} 

newntp=ntp ; 

if(check==l)  new_tree(); 

} 

new_tree ( ) 
{ 

for (i  =  2*N-2 ; i>bad_node ; i -  - ) 
{ 

nodeA ( i ] =nodeA [ i - 1  ]  ; 

if (nodeA  [i] >=bad_node)    nodeA [i] ++ ; 

nodeB [ i ] =nodeB [ i - 1 ] ; 

if (nodeB  [i] >=bad_node)   nodeB [i] ++ ; 

} 

nodeA [bad_node+l] =bad_node ; 

nodeB [bad_node+l] =nodeB [bad_node] ; 

nodeB [bad_node] =nodeC [bad_node] ; 

for (i=l; i<=ntp ; i++ ) 
{ 

if (newtp  [i] >=bad_node)   newtp [i] ++ ; 

} 
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i  =  0  ; 

check=0 ; 
do{ 

i=i+l; 

if  (tp  [i] ==bad_node)  check=l; 
}while (check==0) ; 
left=i; 

f or ( j  =ntp+l ; j  >lef t ; j - - ) 
{ 

newtp [ j ] =newtp [ j - 1 ] ; 

} 

newtp [lef t+1] =bad_node ; 

k=left; 

check=0 ; 

do{ 

k  +  +; 

if (newtp [k] ==nodeC [bad_node] )  check+- 
} while (check<2 ) ; 

for ( j  =ntp  +  2 ; j  >k ; j -  - ) 
{ 

newtp [ j ] =newtp [ j - 1 ] ; 

} 

newtp [k+1] =bad_node ; 
newntp=ntp+2 ; 


} 

output ( ) 


outtp=fopen ( "n_topology . dat " ,    "w") ; 
fprintf (outtp, "%d\n" , N) ; 

lef t=1000 ; 
right=2000 ; 
nlef t  =  0  ; 
nright=0 ; 

for (i=l ; i<=newntp ; i++) 
{ 

if (newtp [i] >N) 
{ 

check=0 ; 

for  (j=l;  j<i;  j++) 
{ 

if (newtp [j ] ==newtp [i] )  check=l 

} 

if  (check==0) 
{ 

fprintf (outtp, "%d\t" , left) ; 
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nlef t++ ; 

} 

else 
{ 

f printf (outtp, " %d\t " , right ) ; 
nright++ ; 

} 

} 

if (newtp [i] <=N) 
{ 

fprintf (outtp, " %d\t " , newtp [i] ) ; 
} 

} 

fprintf (outtp, "%d\n" ,500  0) ; 
if (nlef t ! =nright ) 
{ 

printf ( "There  is  sth  wrong. \n"); 
printf ( "nlef t  =  %d\n" , nlef t)  ; 
printf ( "nright=%d\n" , nright) ; 

} 

f close (outtp)  ; 
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3 .   final-tree  .  c 


/*  read  in  branch  length  from  baseml,  generate  the  input  file 
for  sim.c  */ 

#include  <stdlib.h> 
#include  <stdio.h> 
ttinclude  <math.h> 
#include  <string.h> 

int  N, i, j ,k, l,m, newtp [150] , newntp, nbr, orderbr [150] , check; 

int  nodeA [100] , nodeB [100] ; 

int  len, nodeC [150] , bad_node ; 

float   inpbr [150] , br [150] ; 

char  tp  [1000]  , c; 

char  species [50] [50] ; 

FILE  *inp, *out , *inputtp; 

main  ( ) 
{ 

input  ( )  ; 
calculation ( ) ; 
output ( ) ; 

} 


input ( ) 
{ 

inp=fopen ( "mlb . dat " , "r" ) ; 
f scanf (inp, "%d\n\n" , &N) ; 

i  =  0  ; 
m=0  ; 
do{ 

i  +  +  ; 

tp [i] =getc (inp) ; 

if  (tp[i]=='  :') 

{ 

m++  ; 

f scanf ( inp, " %f " , &inpbr [m] ) ; 

} 

}while (tp  [i]  !  =  ';') ; 
nbr=m; 

/*  check  out  input  */ 
/* 
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printf ("%d\n\n" ,N) ; 
i  =  0  ; 
m=0  ; 
do{ 

i  +  +  ; 

printf ( "%c" , tp  [i] )  ; 
if  (tp[i]=='  :  '  ) 
{ 

m++  ; 

printf ( " %8 . St" , inpbr [m] ) ; 

} 

}while (tp [i] !=';'); 
printf ("\n") ;*/ 

f close (inp) ; 

} 


calculat  ion ( ) 
{ 


change_input  ( )  ; 
structure  ( )  ; 
branch ( ) ; 


change_input ( ) 
{ 


j=0;    /*  newtp  */ 
i=0;    /*   tp  */ 
m=0;   /*  species  */ 

do{ 
i  +  +  ; 

if  (tp[i]=='  ('  ) 
{ 

j++; 

newtp [ j ] =1000 ; 

} 

if  (tp[i]=='  )  '  ) 
{ 

j  +  +  ; 

newtp [ j ] =2000 ; 


if  ((tp[i]>='A')&&(tp[i]<='Z')) 
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m++  ; 


k=i; 

do{ 

k++; 

}while(tp[k]  !  =  ':'); 
for  (l  =  l;l<=k-i;l  +  +  ) 

{ 

len=strlen (species [m] ) ; 
species [m]  [len] =  tp  [1+i-l]  ; 

} 


3++; 

newtp [ j ] =m; 

} 

}while(tp[i]  !  =  '  ;  '  ) 
newntp=j ; 


} 

structure ( ) 


i  =  0  ; 
k=N; 
do{ 


/*   find  the  right  side  of  the  node  */ 

do{ 

i=i  +  l  ; 

}while (newtp [i] ! =2000) ;    /*   2000   indicate  right  edge  of 
a  cluster  */ 

k=k+l;  /*  k  is  the  node  #  */ 

newtp [i]=k;  /*  reassign  right  edge  to  be 

node#  */ 

nodeB [k] =newtp [i-1] ; 

/*  find  the  left  side  of  the  node  */ 


3=1; 

do{ 

j=j-i; 

}while (newtp  [j ]! =1000)  ;      /*   1000   indicates  left  edge  of 
a  cluster  */ 

newtp [j]=k;  /*  reassign  left  edge  to  be 

node#  */ 

nodeA [k] =newtp [ j +1] ; 

/*check  if  this  tree  have  3  node  case  */ 
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l=j+l; 

if ( (newtp [1] <=N) && (newtp  [1  +  1]  ! =nodeB [k] ) ) 
{ 

nodeC [k] =newtp [1+1] ; 

} 

if (newtp [1] >N) 
{ 

do{ 

1=1+1; 

}while (newtp [1] ! =nodeA [k] ) ;   /*  find  the  right  edge 

of  nodeA  */ 

if (newtp  [1  +  1]  ! =nodeB  [k] )         /*  3  node  case  */ 
{ 

nodeC [k] =newtp [1+1] ; 

} 

} 

if (nodeC [k] ! =0) 
{ 

if (nodeC [k] <=N)  1=1+1; 

else 

{ 

1=1+1; 

do{ 

1=1+1; 

}while (newtp [1] ! =nodeC [k] ) ; 

} 

if (newtp [1+1] ! =nodeB [k] ) 

printf ( "There  is  sth  wrong  for  node  %d\n",k); 

} 

} while ( i<newntp) ; 

/*for (i=l; i<=newntp; i++) 
{ 

printf ( " %d\t " , newtp [i] ) ; 
W 


} 

branch ( ) 
{ 

m=0  ; 

for (i=l ; i<=newntp; i++) 
{ 

if (newtp [i] <=N) 
{ 
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m++  ; 

orderbr [m]  =i  ; 
} 

if (newtp [i] >N) 
{ 

check=0 ; 

for  (j=i-l;  j>  =  l;  j--) 

{ 

if (newtp [j ] ==newtp [i] )  check=l; 

} 

if (check==l) 

{ 

m++  ; 

orderbr [m]  =i ; 

} 

} 

} 

for (i=l; i<=nbr ; i++ ) 
{ 

br [newtp  [orderbr [i] ] ] =inpbr [i]  ; 

} 

/* 

for (i=l ; i<=newntp ; i  +  + ) 
{ 

if (newtp [i] <=N) 
{ 

print f("%s:%8 .  6f "  , species [newtp [i]  ]  , br [newtp [i]  ]  )  ; 
} 

if (newtp [i] >N) 
{ 

check=0 ; 

for (j=i-l; j >=1; j --) 
{ 

if (newtp [j ] ==newtp [i]  )    check  =  l; 

} 

if ( check==l ) 
{ 

print f ( " %c : %8 . 6f " ,  ' )  '  , br [newtp [i] ]  )  ; 

} 

if (check==0) 
{ 

printf ( " %c" , ' ( ' ) ; 

} 

} 

}*/ 

} 

output ( ) 

{ 
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out=f open ( " sim . dat " , "w" ) ; 
fprintf (out , " %d  sequences\n" , N) ; 

for (i=l; i<=N; i++) 
{ 

fprintf (out , " %d" ,  i)  ; 

fprintf (out , "   %s\n" , species  [i] ) ; 

} 

check=0 ; 

for  (i=N+l;i<=2*N-l;i  +  +) 
{ 

if (nodeC [i] ! =0) 

{ 

check=l ; 
bad_node=i ; 
} 

} 

fprintf (out, "bad_node\t%d\n" , bad_node)  ; 

for (i=N+l ; i<=2*N-3 ; i++) 
{ 

fprintf (out ," %3d     and     %3d" , i , nodeA [i]  )  ; 
fprintf (out , "         %lf \n" , br [nodeA [i] ] ) ; 

fprintf (out ," %3d     and     %3d" , i , nodeB [i]  )  ; 
fprintf (out ,  "  %lf \n" , br [nodeB [i] ]  )  ; 

i  f ( i  =  =bad_node ) 
{ 

fprintf (out ," %3d     and     %3d" , i , nodeC [i]  )  ; 
fprintf (out ,  "         %lf \n" , br [nodeC [i]  ]  )  ; 
} 

} 

if (bad_node==0) 
{ 


else 


i=2*N-2 ; 

fprintf (out , 

•%3d 

and 

%3d" , i , nodeA  [i] )  ; 

fprintf (out , 

%lf \n" 

, br [nodeA [i] 

]  )  ; 

fprintf (out , 

•%3d 

and 

%3d" , i , nodeB 

[i]  )  ; 

fprintf  (out , 

%lf \n" 

, br [nodeB [i] 

]  )  ; 

fprintf (out , 

•%3d 

and 

l\t  %lf\n" 

,  i,br 

[l] 

i=2*N-3 ; 

fprintf (out , 

■%3d 

and 

l\t  %lf\n" 

,  i,br 

[l]  ) 

} 

f close (out ) ; 

} 
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4 .    sim . c 


/*  simulate  transition  between  purple  flower  and  white  flower, 
find  the  final  distribution  of 

purple  (-1)    flower  species  and  white (1)    flower  species*/ 

#include  <stdlib.h> 
^include  <stdio.h> 
#include  <math.h> 

#define  step  40 

int  i  ,  j , k, 1 , m, w, newN, nodeA [100] , nodeB [10  0] , nodeC [100] ; 

int  ancestor [100] , color [100] ; 

int 

check , seed, nw [5  0  00 ]  , newseed, runtime , nmutation, mutation [10  0]  ; 
int  bad_node, wp, pw; 

int  order  [5000]  , y  [10  0]  , printposition, totaly, count [100]  ; 
int  biggerthanR  [50]  [50]  ; 
float  totalR  [50]  [50]  ,- 
double 

branch[150] , rate , p_wrate , w_prate , t , p , sample , ttw [ 5 0 00 ] ,R[5000] ; 
double  minimum_x, minimum_y, maximum_x, maximum_y ; 
double  step_x, orderR  [5000]  ,minimum_R; 
double  x[100]  , percenty  [100]  ,ordery[100]  ; 
char  species [50]  [50]  , string [10]  ; 

FILE   *inp, *out ; 

main ( ) 
{ 

input  ( )  ; 

for (wp=0 ;wp<  =  45 ;  wp++) 
{ 

w_prate= (float) (wp)/100; 
for (pw=l ; pw<=4  5 ; pw++ ) 
{ 

p_wrate= (float)  (pw)/100; 
totalR [wp] [pw] =0 ; 

printf ( " w_prate=%f \tp_wrate=%f \n" , w_prate , p_wrate) ; 
for (runt i me  =  1 ; runt  ime<  =  1000  ; runt  ime  +  +  ) 

{ 

calculation ( ) ; 

} 

record ( ) ; 
} 

} 

output ( ) ; 

} 

input ( ) 
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printf ("Seed  for  random  number  generator   (samller  than 
digits) :    \n" ) ; 

scanf("%6d",    &seed) ; 
srand48 (seed) ; 

inp=f open ( " sim . dat " , "r " ) ; 
f scanf ( inp , " %d  sequences\n" , &newN) ; 

for (i=l ; i<=newN; i++) 
{ 

f scanf ( inp , " %d" , &j ) ; 

f scanf ( inp, "   %s\n" , species  [j ] )  ; 

} 

f scanf (inp, "%s\t%d\n" , string, &bad_node) ; 

for (i=newN+l ; i<=2*newN-3 ; i++) 
{ 

f scanf ( inp ," %3d     and     %3d" , &j , &nodeA [i] ) ; 
f scanf ( inp , "         %lf \n" , Scbranch  [nodeA [ j ]  ]  )  ,- 

f scanf ( inp ," %3d     and     %3d" , &j , &nodeB [i]  )  ; 
f scanf (inp,  "         %lf \n" ,  Scbranch [nodeB [j  ]  ]  )  ; 

i  f ( i  = =bad_node ) 
{ 

f scanf (inp, "%3d     and     %3d" , &j , &nodeC [i]  )  ; 
f scanf ( inp,  "         %lf \n" , &branch [nodeC [ j  ]  ]  )  ; 


if (bad_node==0) 
{ 

i=2*newN-2 ; 
f scanf ( inp , 
f scanf ( inp , 
f scanf ( inp , 
f scanf ( inp , 
f scanf (inp, 


%3d     and     %3d"  ,  &  j  ,  ScnodeA  [  i  ]  )  ; 

%lf  \n"  ,  Scbranch  [nodeA  [j  ]  ]  )  ; 
%3d     and     %3d" , &j , &nodeB [ j ] ) ; 

%lf  \n"  ,  Scbranch  [nodeB  [  j  ]  ]  )  ; 
%3d     and         l\t   %lf\n"  , &j  ,  Scbranch  [1]  ) 


else 
{ 

i=2*newN-3 ; 

f  scanf  (inp,  "%3d     and         l\t   %lf\n"  ,  Scj  ,  Scbranch  [1]  ) 

} 

f close ( inp) ; 


calculation ( ) 
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if (bad_node==0) 
{ 

for ( i=newN+l ; i<=2*newN-2 ; i++ ) 
{ 

k=nodeA [i]  ; 
ancestor [k] =i ; 
k=nodeB [i] ; 
ancestor  [k] =i ; 
} 

m=2*newN-2 ; 
color [m] =  - 1 ; 

for (i=l ; i<=2*newN-3 ; i++) 
{ 

color  [i] =0 ; 
} 

nmutation=0 ; 
do{ 

k=nodeA [m] ; 

t=branch  [k]  ,- 
if  (color [m] =  =  -1) 
{ 

p=w_prate-p_wrate*exp ( - 1* ( w_prate+p_wrate ) *t ) ; 
p=p/ (w_prate+p_wrate) ,- 
sample=drand48 ()  ; 
if (sample>p) 
{ 

color  [k] =1 ; 

nmut at  ion=nmutat  ion+ 1 ; 
mutation [nmutation] =k; 

} 

else 
{ 

color [k] =-1 ; 

} 

} 

if (color [m] ==1) 
{ 

p=p_wrate+w_prate*exp ( - 1  * (w_prate+p_wrate) *t)  ; 
p=p/ (w_prate+p_wrate) ; 
sample=drand48 () ; 
if (sample>p) 
{ 

color [k] =-1 ; 
nmutation=nmutation+l ; 
mutation  [nmutation] =k; 

} 

else 
{ 
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color [k] =1 ; 

} 

} 

m=k; 
}while (m>newN) ; 

do{ 

k=ancestor [m] ; 
i  f  ( k ! =bad_node ) 
{ 

if (color [nodeA [k] ] ==0 )   down ( ) ; 
else 
{ 

if (color [nodeB [k] ] ==0)   down ( ) ; 

} 

} 

else 
{ 

if (color [nodeA [k] ] ==0)    down ( ) ; 
else 
{ 

if (color [nodeB  [k] ] =  =  0)   down ( )  ; 

else 

{ 

if (color [nodeC  [k] ] =  =  0)    down ( )  ; 

} 

} 

} 

m=k  ; 

}while (m<2*newN-4 ) ; 

} 

/*  check  */ 
check=0 ; 
1  =  1; 

if (bad_node==0) 
{ 

do{ 

1=1+1; 

if  (color [1] =  =  0)  check=l; 
jwhile (l<2*newN-2  check==0); 

} 

else 
{ 

do{ 

1=1+1; 

if  (color  [1] =  =  0)  check=l; 
}while (l<2*newN-3   &&  check==0); 

} 

if (check==l) 


53 


{ 

printf (" calculation  is  not  completed . \n" ) ; 
printf (" species  %d  is  missing . \n" , 1 ) ; 

} 

statistics  ( )  ; 

} 

down ( ) 
{ 

do{ 

if (color [nodeA[k] ] ==0)   l=nodeA[k] ; 
else 
{ 

if (color [nodeB [k] ] ==0)    l=nodeB [k] ; 
else 

{ 

if ( (k==bad_node) (color [nodeC [k] ] ==0) ) 

l=nodeC [k] ; 

else  printf ("sth  is  wrong  at  node  %d\n" 

} 

} 

t=branch  [1]  ; 

if  (color [k] =  =  -1) 

{ 

p=w_prate-p_wrate*exp (-1* (w_prate+p_wrate) *t) 
p=p/ (w_prate+p_wrate) ; 
sample=drand4  8 () ; 
if ( sample>p) 
{ 

color [1] =1 ; 

nmutation=nmutation+l ; 
mutation  [nmutation] =1 ; 

} 

else 
{ 

color [1] =-1 ; 

} 

} 

if  (color  [k] =  =  1) 
{ 

p=p_wrate+w_prate*exp ( - 1* ( w_prate+p_wrate )  *t ) 
p=p/ (w_prate+p_wrate) ,- 
sample=drand4  8 ()  ; 
if ( sample>p) 
{ 

color  [1] =  -1 ; 
nmutation=nmutation+l ; 
mutation [nmutation] =1 ; 

} 


54 


else 
{ 

color [1] =1 . 


k=l  ; 
}while (k>newN) 


statistics ( ) 
{ 


for (i=l ; i<=2 *newN-l ; i++) 
{ 

count [ i ] =  0 ; 

} 


ttw  [runtime] =0 ; 
nw [ runt  ime ] =  0 ; 

for ( i=l ; i<=newN; i++ ) 
{ 

if  (color [i] =  =  1) 
{ 

nw [runtime] =nw [runtime] +1 ; 

w=i  ; 

do{ 

ttw [runtime] =ttw [runtime] +branch [w] ; 

count [w] =1 ; 

w=ancestor [w] ; 
}while ( (color [w] ==1) && (count [w] ==0) ) ; 
} 

} 

if (ttw [runtime] ==0)   R [runtime] =0 ; 

else  R [runtime] =nw [runtime] /ttw [runtime] ; 

totalR[wp]  [pw] =totalR [wp] [pw] +R [runt ime] ; 

} 


record ( ) 
{ 

f or (i=l ; i<=12 ; i++) 
{ 

y [i] =0; 

} 


for  (i  =  l;i<  =  1000;i  +  +  ) 
{ 

if(R[i]<30)  y[l]++; 

if(R[i]>80)  y[12]++; 

if (  (R  [i] <  =  80) && (R  [i] >  =  3  0)  ) 

{ 
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j  =  (int)  (  (R[i]  -30)  /5)  +2  ; 

y  [j]  ++; 

} 

} 

} 

output ( ) 

{ 

out  =  f open ( " f  inal . out " ,  " w" )  ; 
for (wp=l ; wp<=2  0 ;wp++) 
{ 

for (pw=l ;pw<=20 ;pw++) 
{ 

f print f (out , "%f \t " , totalR [wp] [pw] /100  0) ; 
} 

fprintf (out , " \n" ) ; 

} 

f close  (out )  ; 

} 
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5 .   probRS . c 

/*  get  joint  probability  of  R  and  S,   based  on  all  parsimony- 
trees  */ 

#include  <stdlib.h> 
#include  <stdio.h> 
#include  <math.h> 

#define  step  20 

int  i,j,k,l,m,w, newN , nodeA [100] , nodeB [100] , nodeC [100] ; 

int  ancestor [100] , color [100] ; 

int 

check, seed, nw [6000]  , newseed, runt ime , nmutat ion, mutation [100]  ; 
int  bad_node , wp , pw ; 

int  count  [15  0]  , nwhite [50]  [45]  [45]  ; 

int   nR [110]  [50]  [4  0]  [40]  ; 

int   len,nf ,nfile, ct  [110]  [50]  [45]  [45]  ; 

float  prob_R_given_S [110] [50] [40] [40] , prob_S [50] [40] [40]; 

float   joint_prob_R_S [110] [50] [45] [45] ; 

float  probRS  [110]  [50]  [45]  [45]  ,del; 

float  boarder , boarder2 , sum [40] [40] ; 

double 

branch[150] , rate , p_wrate , w_prate , t , p , sample , ttw [ 60 0 0 ] ,R[6000] ; 

char  species  [50]  [50]  , string [10]  ; 

char 

filename  [3  0]  =  {  '  R'  ,  '_'  ,  '  S'  ,  '_'  ,  'W  ,  '  p'  ,  '_'  ,  '  p'  ,  '  w'  ,  '_'  ,  '  \  0'  }  ; 
char  add  [15] = { ' 0 ' , ' 1 ' , ' 2 ' , ' 3 ' , ' 4 ' , ' 5 ' , ' 6 ' , ' 7 ' , ' 8 ' , ' 9 '  ,  ' \0  '  }  ; 
char  newf ilename [3  0] ; 
char  file  [80]  [50]  ; 

FILE  *inp,*out; 

main  ( ) 
{ 

getf ile  ( )  ; 
set_zero ( ) ; 

for  (nf  =  l;nf<=nf  ile,-nf  +  +) 
{ 

printf ( "computing  tree  %d\n",nf); 
input ( ) ; 

for (wp=0 ;wp<=40 ;wp++) 
{ 

w_prate= (float) (wp)/100; 
for (pw=l ;pw<=40 ;pw++ ) 
{ 

p_wrate= (float ) (pw)/100; 

printf ( "w_prate=%f \tp_wrate=%f \n" , w_prate , p_wrate) ; 
setup  ( )  ; 

f or ( runt ime =1 ; runt ime< =1000 ; runt ime++ ) 
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calculation ( ) 
} 

prob ( ) ; 


output ( ) 

} 

getf ile () 


printf ("Seed  for  random  number  generator   (samller  than  6 
digits) :    \n" ) ; 

scanf ( " %6d" ,    &seed) ; 
srand48 (seed) ; 

inp=f open ( " tree_input_f iles " , " r " ) ; 
f scanf ( inp, " %d\n" , Unfile) ; 

f or (i=l ; i<=nf ile; i++) 
{ 

f scanf ( inp ,  " %s\n" , f ile  [  i]  )  ; 

} 

f close (inp) ; 


set_zero ( ) 
{ 


for (wp=0 ;wp<=40 ; wp++) 
{ 

for (pw=l ;pw<=40 ;pw++) 

{ 

for ( i=l ; i<=newN; i++) 
{ 

for (j=0;j<=100;j++) 
{ 

Ct  [j]  [i]  [wp]  [pw]  =0; 
probRS[j]  [i]  [wp]  [pw]=0, 


} 
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input ( ) 
{ 


inp=f open (file  [nf ] , "r" )  ; 
fscanf (inp, "%d  sequences\n" , &newN) ,- 

for (i=l; i<=newN; 

{ 

fscanf (inp, " %d" , & j ) ; 

fscanf ( inp , "   %s\n" , species  [ j ] )  ; 

} 

fscanf (inp, "%s\t%d\n" , string, &bad_node) ; 

for ( i=newN+l ; i<=2  *newN- 3 ; i  +  + ) 
{ 

fscanf (inp, "%3d     and     %3d" , &j , SnodeA [i] ) ; 
fscanf ( inp , "         %lf \n" , Sbranch [nodeA [  j  ]  ]  )  ; 

fscanf (inp, "%3d     and     %3d" , &j , knodeB  [i] )  ; 
fscanf (inp, "         %lf \n" , Sbranch [nodeB [  j  ]  ]  )  ,- 

if ( i==bad_node) 
{ 

fscanf (inp, "%3d     and     %3d" , &j , SnodeC [i]  )  ; 
fscanf ( inp,  "         %lf \n" , &branch  [nodeC [ j ] ]  )  ; 
} 

} 

if (bad_node==0 ) 
{ 

i=2*newN-2 ; 

f scanf ( inp , " %3d     and     %3d" , &j , &nodeA [ i ] ) ; 

fscanf ( inp ,  "         %1 f \n" , ^branch [nodeA [ j ] ]  )  ; 

fscanf ( inp,  " %3d     and     %3d" , & j , toodeB [ j ]  )  ; 

f scanf ( inp , "         %lf \n" , &branch [nodeB [ j ] ] ) ; 

f scanf ( inp ," %3d     and         l\t  %lf \n" , &j , &branch [1] ) ; 

} 

else 
{ 

i  =  2  *newN-3 ; 

fscanf (inp,  "%3d     and         l\t   %lf \n" , &j , &branch [1]  )  ; 

} 

f close ( inp) ; 

} 

setup ( ) 
{ 

for  (  i  =  l ,-  i<=newN;  i  +  +  ) 
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{ 

nwhite[i]  [wp]  [pw]=0; 
for ( j  =0 ; j  <  =  100 ; j  ++) 
{ 

nR[j]  [i]  [wp]  [pw]  =0; 

} 

} 

} 

calculation ( ) 
{ 

if  (bad_node=  =  0) 
{ 

for ( i=newN+l ; i<  =  2  *newN-2 ; i  +  +  ) 
{ 

k=nodeA  [  i  ]  ,- 
ancestor [k] =i ; 
k=nodeB  [i]  ; 
ancestor [k] =i ; 
} 

m=2*newN-2 ; 
color [m] =- 1 ; 

f or ( i  =  l ; i<  =  2  *newN-3 ; i  +  + ) 
{ 

color  [i] =0 ; 
} 

nmutation=0  ,- 
do{ 

k=nodeA  [m]  ; 
t=branch [k] ; 
if (color  [m] =  =  -1) 
{ 

p=w_prate-p_wrate*exp ( - 1  * ( w_prate+p_wrate ) * t )  ; 
p=p/  (w_prate+p_wrate)  ,- 
sample=drand48 () ; 
if (sample>p) 
{ 

color [k] =1 ; 

nmutation=nmutation+l ; 
mutation [nmutation] =k; 

} 

else 
{ 

color [k] =-1; 

} 

} 

if (color [m] ==1) 
{ 


p=p_wrate+w_prate*exp ( -1* (w_prate+p_wrate) 
p=p/ (w_prate+p_wrate) ; 
sample=drand4  8 () ; 
if (sample>p) 
{ 

color [k] =-1 ; 
nmutation=nmutation+l ; 
mutation [nmutation] =k ; 

} 

else 
{ 

color [k] =1 ; 

} 

} 

m=k  ; 
}while (m>newN) ; 

do{ 

k=ancestor [m] ; 
i  f ( k ! =bad_node ) 
{ 

if (color [nodeA [k] ] ==0)    down ( ) ; 
else 

{ 

if (color  [nodeB [k] ] =  =  0)    down ( )  ; 

} 

} 

else 
{ 

if (color [nodeA [k] ] ==0 )   down ( ) ; 
else 
{ 

if (color  [nodeB  [k] ] =  =  0)    down ( )  ; 

else 

{ 

if  (color  [nodeC [k] ] =  =  0)    down ( )  ; 

} 

} 

} 

m=k  ; 

}while (m<2 *newN-4 ) ; 

} 

/*  check  */ 
check=0 ; 
1  =  1; 

if (bad_node==0 ) 
{ 

do{ 

1=1+1 ; 

if (color  [1] =  =  0)   check= 1; 


hi 


}while  (l<2*newN-2  &&  check=  =  0); 

} 

else 
{ 

do{ 

1=1+1; 

if (color [1] ==0)  check=l; 
}while (l<2*newN-3  &&  check==0) ; 

} 

if (check==l) 
{ 

printf ( "calculation  is  not  completed . \n" )  ; 
printf (" species  %d  is  missing . \n" , 1 ) ; 

} 

statistics  ( )  ; 

} 

down ( ) 
{ 

do{ 

if (color [nodeA [k] ] ==0)    l=nodeA [k] ; 
else 
{ 

if (color  [nodeB  [k] ] =  =  0)    l=nodeB [k]  ; 

else 

{ 

if (  (k==bad_node) (color [nodeC [k] ] =  =  0)  ) 

l=nodeC [k] ; 

else  printf ("sth  is  wrong  at  node  %d\n" 

} 

} 

t=branch [1] ; 

if (color  [k] =  =  -1) 

{ 

p=w_prate-p_wrate*exp (-1* (w_prate+p_wrate)  *t) 
p=p/ (w_prate+p_wrate)  ; 
sample=drand4  8 ()  ; 
if (sample>p) 
{ 

color  [1] =1; 

nmutation=nmutation+l ; 
mutation  [nmutation] =1 ; 

} 

else 
{ 

color [1] =-1; 

} 

} 
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if  (color [k] =  =  1) 
{ 

p=p_wrate+w_prate*exp ( -1* (w_prate+p_wrate) *t) ; 
p=p/ (w_prate+p_wrate) ; 
sample=drand48 () ; 
if (sample>p) 

{ 

color  [1] =  -1; 
nmutation=nmutation+l ; 
mutation [nmutation] =1 ; 

} 

else 
{ 

color [1] =1; 

} 

} 

k=l; 
} while (k>newN) ; 

} 

statistics ( ) 
{ 

f or (i  =  l ; i<  =  2  *newN- 1 ; i  +  + ) 
{ 

count  [i] =0 ; 

} 

ttw [runtime] =0 ; 
nw [runt  ime] =0 ; 

for (i=l; i<=newN; i++) 
{ 

if (color  [i] =  =  1) 
{ 

nw [ runt  ime ] =nw [ runt  ime ] + 1 ; 

w=i  ; 

do{ 

ttw [runtime] =ttw [runtime] +branch [w] ; 

count [w] =1 ; 

w=ancestor [w] ; 
}while (  (color [w] =  =  1) && (count  [w] =  =  0)  )  ; 
} 

} 

if (ttw  [runtime]  ! =0)  R [runtime] =nw  [runtime] /ttw [runtime]  ; 
else  R [runtime] =0 ; 

nwhite [nw [runtime] ] [wp] [pw]++; 
if (  (int) R [runtime] <100) 
nR  [( int ) R [runtime] ]  [nw [runtime] ]  [wp]  [pw]++; 
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prob  ( ) 


for (i=l ; i<=newN; i++) 
{ 

prob_S[i]  [wp] [pw] = (float) nwhite [i]  [wp] [pw] /runtime ; 
for ( j  =0 ; j  <  =  100 ; j  ++) 
{ 

if (nwhite  [i]  [wp]  [pw]  !=0) 
{ 

prob_R_given_S  [ j ]  [i]  [wp]  [pw]  =  (float ) nR [j ]  [i]  [wp]  [pw] /nwhite [i]  [ 
wp]  [pw]  ; 

j oint_prob_R_S [ j ]  [i]  [wp]  [pw] =prob_R_given_S [ j ]  [i]  [wp]  [pw] *prob_ 
S  [i]  [wp]  [pw]  ; 

ct  [j]  [i]  [wp]  [pw]  ++; 
del  =  j oint_prob_R_S  [  j  ]  [i]  [wp]  [pw] - 
probRS  [j  ]  [i]  [wp]  [pw]  ; 

probRSfj]  [i]  [wp]  [pw]  =probRS  [j]  [i]  [wp]  [pw] +del/ct  [  j  ]  [i]  [wp]  [pw]  ; 


/*test*/ 
/* 

boarder  =  0  ,- 
for (i=l ; i<=newN; i++) 
{ 

for ( j=0; j<=100; j++) 
{ 

boarder=boarder  +  j oint_prob_R_S [ j ]  [i]  [wp]  [pw] 


print f ( "boarder=%f \n" , boarder) ; * / 


output ( ) 
{ 


for (wp=0 ;wp<=4  0 ;wp++) 
{ 

for (pw=l ;pw<=40 ;pw++) 
{ 

sum  [wp]  [pw] =0 ; 

for  (i  =  l ; i<=newN; i  +  +  ) 
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for (j=0; j<=100; j++) 
{ 

sum[wp]  [pw]=sum[wp]  [pw] +probRS  [  j  ]  [i]  [wp]  [pw]  ; 
} 

} 

/*printf ( "total  prob  of  wp(%d)  pw(%d) 
=  %f  \n" ,  wp,  pw,  sum  [wp]  [pw]),-*/ 
} 

} 

for (wp=0 ;wp<=4  0 ;wp++) 
{ 

for (pw=l ;pw<=40 ;pw++) 
{ 

f or ( i=l ; i<=newN; i++) 
{ 

for (j=0; j<=100; j++) 
{ 

probRS[j]  [i]  [wp]  [pw] =probRS [ j ]  [i]  [wp]  [pw] /sum [wp]  [pw]  ; 
} 

} 

} 

} 

for  (i  =  0 ; i<  =  8 ; i  +  +) 
{ 

wp=i  *  5 ; 

for (j=l; j<=8; j++) 
{ 

pw=j*5; 

strcpy (newfilename , filename) • 
len=strlen (newf ilename) ; 
k=wp/10 ; 

newf ilename [len] =add [k] ; 
l=wp%10 ; 

newf ilename [len+1] =add [1]  ; 
newfilename [len+2]  ='_'  ; 
k=pw/10 ; 

newf ilename [len+3 ] =add [k]  ; 
l=pw%10 ; 

newf ilename [len+4] =add [1]  ; 
newfilename [len+5] =' \0 ' ; 
printf ( " %s\n" , newfilename) ; 
write  ( )  ; 

} 


write ( ) 
{ 
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out =f open (newfilename , "w") ; 
for (k=l ; k<=newN; k++ ) 
{ 

for (1=0;1<=100;1++) 
{ 

fprintf (out, " %f \t " , probRS [1]  [k]  [wp]  [pw] )  ■ 
} 

fprintf (out , " \n" ) ; 

} 

f close (out ) ; 

} 
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6 .   contour . c 

/*  find  the  area  which  covers  95%  of  all  simulate    (R,S)  */ 

^include  <stdlib.h> 
#include  <stdio.h> 
#include  <math.h> 

int 

i , j ,  k, bottom [5  0] , up [50] , cut , R, S, smallest S , biggest S , bad_S , check; 
int  wp, pw, len, 1 , w_p, p_w; 

float  probRS[210] [50] , newprobRS [210] [50]; 

float  min, total_prob ; 

char 

filename  [3  0]  =  {  'R'  ,  '_'  ,  'S'  ,  '_'  ,  'W  ,  'p'  ,  '_'  ,  'p'  ,  'w'  ,  '_'  ,  '  \  0'  }  ; 
char  add [15] ={'0','1','2','3','4','5','6','7','8','9','\0'}; 
char  inputf ile  [30]  ; 

char  writefile[30]={'c' , ' o' , 'n' , ' \0' }  ; 
char  outputf ile [30] ; 

FILE  *inp,*out; 

main  ( ) 
{ 

for (w_p=0 ; w_p<=9 ;w_p++) 

{ 

wp=w_p*5 ; 

for (p_w=l ;p_w<=9 ;p_w++) 
{ 

pw=p_w*5 ; 

strcpy ( inputf ile , filename) ; 
len=strlen (inputf ile) ; 
k=wp/10 ; 

inputf ile [len] =add [k] ; 
l=wp%10 ; 

inputf ile [len+1] =add  [1]  ; 
inputf ile [len+2 ] = ' _' ; 
k=pw/10 ; 

inputf ile  [len+3 ] =add  [k]  ; 
l=pw%10 ; 

inputf ile [len+4] =add  [1]  ; 
inputf ile [len+5] = ' \0 ' ; 
printf ( " %s\n" , inputf ile) ; 

input ( ) ; 
s  e  t  up  ( )  ; 
total_prob=l  ; 
do{ 

boarder ( ) ; 

cut_of f  ( )  ; 

find_bad_S () ; 
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/*printf ( "total_prob=%f \n" , total_prob) ; */ 
}while (total_prob>0 . 95 ) ; 

strcpy (outputf ile , writef ile) ; 
len=strlen (outputf ile) ; 
k=wp/10 ; 

outputf ile [len] =add [k] ; 
l=wp%10 ; 

outputf ile [len+1] =add [1] ; 
k=pw/10; 

outputf ile [len+2] =add [k] ; 
l=pw%10 ; 

outputf ile [len+3] =add [1] ; 
outputf ile  [len+4] =' \0 '  ; 
printf (" %s\n" , outputf ile) ; 

output ( ) ; 
} 

} 

} 

input ( ) 
{ 

inp=f open ( inputf ile ,  "r")  ; 
for  (i  =  l ; i<=42 ; i  +  +  ) 
{ 

for (j=0; j<=200; j++) 
{ 

f scanf (inp, " %f \t " , &probRS [ j ] [i] ) ; 
} 

f scanf ( inp , " \n" ) ; 

} 

f close ( inp) ; 

} 

setup ( ) 
{ 

for (i  =  l ; i<  =  42  ;  i  +  +  ) 

{ 

for (j=0; j<=200; j++) 
{ 

newprobRS[j]  [i]  =probRS  [ j  ]  [i]  ; 
} 

} 

} 
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boarder ( ) 
{ 


for  (i=l; i<  =  42 ; i  +  +  ) 
{ 

j=0; 
do{ 

}  while  (  (newprobRS  [j][i]=  =  0)&&(j<  =  200)); 
if  ( j<  =  200) 
{ 

bottom [i] =j ; 

j=201; 

do{ 

j  -  -  ; 

}while  (newprobRS  [j  ]  [i]==0); 

up  [i] =j  ; 

} 

else 
{ 

bottom [i] =0 ; 

up  [i] =0  ; 

} 

/*printf ( "bottom [%d] =%d\tup [%d] =%d\n" , i , bottom [i]  , i , up  [i]  )  ; */ 
} 

} 


cut_of f ( ) 

{ 

min=2 ; 

for  (i  =  l ; i<  =  42 ; i  +  +) 
{ 

if  (bottom [i] +up [i]  !=0) 
{ 

if (min>newprobRS [bottom [i] ] [i] ) 
{ 

min=newprobRS [bottom [i] ] [i] ; 
cut =bot torn [i] *100+i ; 

} 

if (min>newprobRS [up [i] ] [i] ) 
{ 

min=newprobRS [up [i] ]  [i]  ; 
cut=up [i] *10  0+i ; 

} 

} 

} 

R=CUt/100 ; 
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S=CUt%100 ; 
newprobRS  [R]  [S]=0; 
total_prob=total_prob-min; 
/*printf ( "cut  off  prob[%d] [%d] 
%f \ttotal_prob=%f \n" , R, S, probRS [R] [S] , total_prob) ; */ 

} 

f  ind_bad_S  () 
{ 

do{ 

j=0; 

do{ 

} while (bottom [ j ] +up [ j ] ==0 ) ; 

smallestS=j ; 

j=43; 

do{ 

j--; 

} while (bottom [ j ] +up [ j ] ==0 ) ; 
biggestS=j ; 

check=0 ; 

j  =smallestS ; 

do{ 

if (bottom [j ] +up [ j ] ==0) 
{ 

check=l ; 
bad_S=j ; 
} 

} while (  (check=  =  0) && ( j  <biggestS)  )  ; 

if (check==l) 
{ 

if (bad_S-smallestS<biggestS-bad_S) 
{ 

for ( i=smallestS ; i<=bad_S ; i++ ) 
{ 

f or (k=bottom [i] ;k<=up[i] ;k++) 
{ 

newprobRS [k]  [i]=0; 

total_prob=total_prob-newprobRS [k] [i] ; 

} 

bottom [i] =0 ; 
up  [i] =0  ; 

} 

} 

else 
{ 
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for ( i=biggestS ; i>=bad_S ; i- -  ) 
{ 

for (k=bottom [i] ;k<=up[i] ;k++) 
{ 

newprobRS  [k]  [i]=0; 

total_prob=total_prob-newprobRS [k]  [i]  ; 

} 

bottom [i] =0 ; 
up  [i] =0; 

} 


/*printf ( "smallestS=%d\tbiggestS=%d\tbad_S=%d\n" , smallestS ,  bigg 
estS,bad_S) ;*/ 

}while (check==l) ; 


output ( ) 
{ 


out =f open (output file , "w" ) ; 
fprintf (out, "%d\t\t%d\n" ,9,63) ; 
for  (i  =  l ; i<  =  42 ; i  +  +  ) 

{ 

if (bottom [i] ! =0) 

fprintf (out , " %d\t%d\n" , i , bottom [ i] ) ; 

} 

for (i=42 ;i>=l;i--) 
{ 

if  (bottom [i]  ! =0) 

fprintf (out , " %d\t%d\n" , i , up [i] ) ; 

} 

f close (out) ; 
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453  R8\n-)y 


